### Table of contents

* 1: Key concepts in Geospatial data
  * 1.1: Raster and vector data
  * 1.2: Digital Elevation Models
* 2: Interactive data visualization
  * 2.1: Interactive data visualization with Folium
  * 2.2: Interactive mapping using Kepler
  * 2.3: DBSCAN and its application to trajectories clustering
* 3: Spatial data visualization challenges
  * 3.1: Space-time trajectory
  * 3.2: Chicago criminality
  * 3.3: LIDAR data visualization
* 4: Spatial statistics
  * 4.1: Introduction to point pattern statistics
  * 4.2: Point pattern statistics with Preston Crime
  * 4.3: Areal statistics
* 5: Geostatistics

```python
# install the required libraries
!pip install srtm.py
!pip install geopandas
!pip install pyshp
!pip install rasterio
!pip install keplergl
!pip install movingpandas
!pip install basemap
```

```python
# import the required libraries
from descartes import PolygonPatch
import shapefile as shp
import matplotlib.pyplot as plt
import geopandas as gpd
import fiona
import fiona.transform
import numpy as np
import rasterio as rio
import rasterio.sample
from rasterio.plot import show
import pandas as pd
from scipy.spatial.distance import directed_hausdorff
from scipy.stats import skewnorm
from sklearn.cluster import DBSCAN
import movingpandas as mpd
from datetime import datetime, timedelta
import plotly.express as px
from mpl_toolkits.basemap import Basemap
from mpl_toolkits.mplot3d import Axes3D
import pylab
import math
import logging as mod_logging
import srtm
import urllib
import zipfile
import os
import scipy.io
import seaborn as sns
import matplotlib as mpl
import branca.colormap as cm
import folium
from folium import plugins
import geemap
from pyvista import set_plot_theme
from IPython.display import Image

%matplotlib inline
set_plot_theme('document')
mpl.style.use('default')

import warnings
warnings.simplefilter("ignore")
```

```r
# rayshader
library(rayshader)
library(tiff)
library(raster)
library(geoviz)

# chicago
options(warn=-1)
library(ggplot2)
library(tidyverse)
library(rgeos)
library(RColorBrewer)
library(maptools)
library(plotly)
library(scatterplot3d)
library(RNHANES)
library(latticeExtra)

# spatial statistics
library(spatstat)
```

```python
%env HV_DOC_HTML=true
%load_ext rpy2.ipython
```

Explain in your notebook the differences between raster and polygon data:
Vector data consists of points, lines, and polygons: each feature is represented by one or more vertices, which are connected by lines to form the feature. Polygons are vector features whose connected vertices form a closed shape; they are often used to represent areas such as countries, states, and neighborhoods. Raster data, on the other hand, is pixelated data in which each pixel is associated with a specific geographic location. The value of a pixel can be continuous (e.g. elevation) or categorical (e.g. land use), and the raster is accompanied by spatial information that anchors the grid to a particular location.
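As a minimal illustration of the distinction (plain Python, no GIS libraries; the polygon, grid values, origin, and cell size below are all made up):

```python
# Vector: a polygon is an ordered list of (x, y) vertices forming a closed shape.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]

def shoelace_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Raster: a grid of cell values plus georeferencing (origin + cell size)
# that ties each pixel to a geographic location.
elevation = [[10, 12], [11, 15]]     # 2x2 grid of elevation values
origin, cell_size = (0.0, 4.0), 2.0  # hypothetical upper-left corner and resolution

def pixel_to_coords(row, col):
    """Geographic coordinates of a pixel's upper-left corner."""
    x0, y0 = origin
    return (x0 + col * cell_size, y0 - row * cell_size)

print(shoelace_area(square))   # -> 16.0, area enclosed by the vector polygon
print(pixel_to_coords(1, 1))   # -> (2.0, 2.0), location of pixel (1, 1)
```

The point of the sketch: the polygon carries its geometry in its vertices, while the raster carries only values and needs the separate georeferencing to become spatial.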
Design a non-interactive plot of French school districts based on the dataset provided by the instructor. Each district should be represented with a different color.
```python
# reading the shapefile
fname = "//content//drive//MyDrive//GSP//academies-20160209-shp//academies-20160209.shp"
schools = gpd.read_file(fname)
schools.head()

fig, ax = plt.subplots(1, 1, figsize=(10, 10))
schools.plot(cmap="OrRd", ax=ax)
plt.xlim(left=-5, right=10)
plt.ylim(top=53, bottom=40)
plt.title('French school districts')
plt.axis("off");
```

Find a shapefile representing the entire world, and design a set of maps based on different projections / coordinate reference systems (CRS) (e.g. Mercator).
```python
# importing data from geopandas datasets representing the earth
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
world.crs  # original projection

world.plot(figsize=(15, 10))
plt.title('World map from WGS84 (Latitude/Longitude)', fontdict={'fontsize': 20})
plt.axis("off");

world_proj2 = world[(world.name != "Antarctica") & (world.name != "Fr. S. Antarctic Lands")]
world_proj2 = world_proj2.to_crs("EPSG:3395")  # world.to_crs(epsg=3395) would also work
mercator = world_proj2.plot(figsize=(15, 10))
mercator.set_title("Mercator Projection of The World Map", fontsize=20)
plt.axis("off");

# another test of a CRS
# check this link for different CRS:
# https://www.nceas.ucsb.edu/sites/default/files/2020-04/OverviewCoordinateReferenceSystems.pdf
world_test = world.set_crs(32733, allow_override=True)
world_test = world_test.plot(figsize=(15, 10))
world_test.set_title("UTM (South) Projection of The World Map", fontsize=20)
plt.axis("off");
```

Write down a paragraph (in your notebook only) explaining the concept of CRS
A coordinate reference system (CRS) is a framework used to precisely measure locations on the surface of the Earth as coordinates; it provides a standardized way of describing locations. Many different CRS are used to describe geographic data. The CRS that is chosen depends on when the data was collected, the geographic extent of the data, and the purpose of the data.
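To make this concrete, here is a small sketch (plain Python, no pyproj) of how two CRS assign different planar coordinates to the same latitude/longitude: a simple equirectangular mapping versus the spherical Mercator used by web maps. The radius below is the WGS84 semi-major axis; the test point is illustrative.

```python
import math

R = 6378137.0  # WGS84 semi-major axis, in meters

def equirectangular(lon_deg, lat_deg):
    """Plate carree: degrees scaled linearly to meters."""
    return (math.radians(lon_deg) * R, math.radians(lat_deg) * R)

def web_mercator(lon_deg, lat_deg):
    """Spherical Mercator forward projection (as used by web map tiles)."""
    x = math.radians(lon_deg) * R
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return (x, y)

# The same geographic point gets two different projected coordinates:
lon, lat = 2.35, 48.85  # roughly Paris
print(equirectangular(lon, lat))
print(web_mercator(lon, lat))
```

Note how Mercator stretches the y-coordinate away from the equator, which is exactly the distortion visible when comparing the two world maps above.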
Download a raster dataset from the SRTM website (elevation data for the entire world, ±30 meters in precision). Design a 2D map of the area you selected.
```python
fpath = '//content//drive//MyDrive//GSP//srtm_germany_dsm//srtm_germany_dsm.tif'

def rasterio_open(f):
    return rio.open(f)

src_image = rasterio_open(fpath)
fig, ax = plt.subplots(1, figsize=(8, 8))
show(src_image, ax=ax, cmap='viridis')
plt.title('Germany raster map')
plt.axis("off")
plt.show()

file = "//content//drive//MyDrive//GSP//HYP_LR_SR_W//HYP_LR_SR_W.tif"
src_image = rasterio_open(file)
fig, ax = plt.subplots(1, figsize=(12, 12))
show(src_image, ax=ax, cmap='viridis')
plt.title('World raster map')
plt.axis("off")
plt.show()

dem = rio.open(file)
dem_array = dem.read(1).astype('float64')
fig, ax = plt.subplots(1, figsize=(12, 12))
show(dem_array, cmap='Greys_r', ax=ax)
plt.title('World raster map')
plt.axis("off")
plt.show()

elevation_data = srtm.get_data(local_cache_dir="//content//drive//MyDrive//GSP//srtm")
print('CGN Airport elevation (meters):', elevation_data.get_elevation(50.8682, 7.1377))
```

```python
image = elevation_data.get_image((500, 500), (45, 46), (13, 14), 300)
```

```python
mod_logging.basicConfig(level=mod_logging.DEBUG,
                        format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
geo_elevation_data = srtm.get_data()
```

```python
image = geo_elevation_data.get_image((400, 400), (45, 46), (13, 14), 300)
image.show()
```

```python
print('Digital Elevation Model of Istra and Trieste')
geo_elevation_data.get_image((600, 600), (45, 46), (13, 14), 300)
```

```r
# load file
str_name <- 'H:/CY Tech Year 4/Semester 1/Geospatial/Datasets/finalmap.tif'
imported_raster <- raster(str_name)
elmat <- raster_to_matrix(imported_raster)
```

```r
# create plot
elmat %>%
  sphere_shade(texture = "desert") %>%                  # add shade
  add_water(detect_water(elmat), color = "desert") %>%  # add water
  add_shadow(ray_shade(elmat, zscale = 3), 0.5) %>%     # add shadow
  add_shadow(ambient_shade(elmat), 0) %>%               # add shadow
  plot_3d(elmat, zscale = 10, fov = 0, theta = 135, zoom = 1, phi = 45,
          windowsize = c(1000, 800))
Sys.sleep(0.2)
render_snapshot()
```
```r
elmat %>%
  sphere_shade(texture = "desert") %>%
  add_water(detect_water(elmat), color = "desert") %>%
  add_shadow(ray_shade(elmat, zscale = 3), 0.5) %>%
  add_shadow(ambient_shade(elmat), 0) %>%
  plot_3d(elmat, solid = TRUE, water = TRUE, waterdepth = 0, wateralpha = 0.5,
          watercolor = "lightblue", waterlinecolor = "white", waterlinealpha = 0.5,
          fov = 0, theta = 135, zoom = 0.75, phi = 45, windowsize = c(1000, 800))
Sys.sleep(0.2)
render_snapshot()
```

## 2.1: Interactive data visualization with Folium

```python
fname = './../Datasets/academies-20160209-shp'
french_schools = gpd.read_file(fname)
french_schools.head()
```

```python
rng = np.random.default_rng()
french_schools["perc. exam success"] = rng.integers(low=0, high=100, size=len(french_schools))
```

```python
x_map = french_schools.centroid.x.mean()
y_map = french_schools.centroid.y.mean()
print(x_map, y_map)
```

```python
# checking the color scheme
colormap = cm.linear.YlGnBu_09.to_step(data=french_schools['perc. exam success'],
                                       method='quant',
                                       quantiles=[0, 0.1, 0.75, 0.9, 0.98, 1])
colormap
```

```python
map_ = folium.Map(location=[47, 3], zoom_start=5, tiles=None)
folium.TileLayer('OpenStreetMap', name="Light Map", control=False).add_to(map_)
colormap.caption = "% of exam success"

style_function = lambda x: {
    "weight": 0.5,
    'color': 'black',
    'fillColor': colormap(x['properties']['perc. exam success']),
    'fillOpacity': 0.75}
highlight_function = lambda x: {
    'fillColor': '#000000',
    'color': '#000000',
    'fillOpacity': 0.50,
    'weight': 0.1}

FRENCH_SCHOOLS = folium.features.GeoJson(
    french_schools,
    style_function=style_function,
    control=False,
    highlight_function=highlight_function,
    tooltip=folium.features.GeoJsonTooltip(
        fields=['name', 'perc. exam success'],
        aliases=['Name', '% of exam success'],
        style=("background-color: white; color: #333333; "
               "font-family: arial; font-size: 12px; padding: 10px;"),
        sticky=True))
colormap.add_to(map_)
map_.add_child(FRENCH_SCHOOLS)
map_
```

```python
phd = pd.read_csv("./../Datasets/PhD.v3.csv")
phd.head()
```

```python
phd_count = phd.groupby(["etablissement_rec"])["etablissement_rec"].count().reset_index(name="count")
phd_count.head()
```

```python
opendata = pd.read_csv("./../Datasets/opendata.csv", sep=";")[['uai - identifiant', 'Libellé', 'Géolocalisation']]
opendata.head()
```

```python
opendata_phd_count = pd.merge(left=opendata, right=phd_count,
                              left_on='Libellé', right_on='etablissement_rec')
opendata_phd_count.head()
```

```python
CYlocation = list(map(float, str(
    opendata_phd_count[opendata_phd_count.Libellé == 'CY Cergy Paris Université']
    ['Géolocalisation'].values).split()[0][2:-2].split(',')))
```

```python
# adding a marker on the CY Cergy Paris Université location
folium.Marker(CYlocation, popup='CY Tech').add_to(map_)

# adding bubbles to the map based on the count of PhDs per institution
for i in range(0, len(opendata_phd_count)):
    coordinates = opendata_phd_count["Géolocalisation"][i]
    folium.Circle(
        location=[float(c) for c in coordinates.split(',')],
        popup=opendata_phd_count.iloc[i]['Libellé'],
        radius=float(opendata_phd_count.iloc[i]['count']),
        color='blue',
        fill=True,
        fill_color='blue').add_to(map_)
map_
```

Note: the maps are screenshots from the HTML generated by the Kepler map.
## 2.3: DBSCAN and its application to trajectories clustering

```python
def kMedoids(D, k, tmax=100):
    # determine dimensions of distance matrix D
    m, n = D.shape
    np.fill_diagonal(D, math.inf)
    if k > n:
        raise Exception('too many medoids')
    # randomly initialize an array of k medoid indices
    M = np.arange(n)
    np.random.shuffle(M)
    M = np.sort(M[:k])
    # create a copy of the array of medoid indices
    Mnew = np.copy(M)
    # initialize a dictionary to represent clusters
    C = {}
    for t in range(tmax):
        # determine clusters, i.e. arrays of data indices
        J = np.argmin(D[:, M], axis=1)
        for kappa in range(k):
            C[kappa] = np.where(J == kappa)[0]
        # update cluster medoids
        for kappa in range(k):
            J = np.mean(D[np.ix_(C[kappa], C[kappa])], axis=1)
            j = np.argmin(J)
            Mnew[kappa] = C[kappa][j]
        np.sort(Mnew)
        # check for convergence
        if np.array_equal(M, Mnew):
            break
        M = np.copy(Mnew)
    else:
        # final update of cluster memberships
        J = np.argmin(D[:, M], axis=1)
        for kappa in range(k):
            C[kappa] = np.where(J == kappa)[0]
    np.fill_diagonal(D, 0)
    # return results
    return M, C
```

```python
# from kmedoid import kMedoids
# kMedoids code is adapted from https://github.com/letiantian/kmedoids
# Some visualization stuff, not so important
sns.set()
plt.rcParams['figure.figsize'] = (12, 12)
```

```python
# Utility functions
color_lst = plt.rcParams['axes.prop_cycle'].by_key()['color']
color_lst.extend(['firebrick', 'olive', 'indigo', 'khaki', 'teal', 'saddlebrown',
                  'skyblue', 'coral', 'darkorange', 'lime', 'darkorchid', 'dimgray'])

def plot_cluster(traj_lst, cluster_lst):
    '''Plots the given trajectories with a color specific to each trajectory's cluster index.
    Outlier trajectories, marked with -1 in `cluster_lst`, are plotted dashed in black.'''
    cluster_count = np.max(cluster_lst) + 1
    for traj, cluster in zip(traj_lst, cluster_lst):
        if cluster == -1:
            # a noisy trajectory, paint it black
            plt.plot(traj[:, 0], traj[:, 1], c='k', linestyle='dashed')
        else:
            plt.plot(traj[:, 0], traj[:, 1], c=color_lst[cluster % len(color_lst)])
    plt.show()
```

```python
# Import dataset
traj_data = scipy.io.loadmat('//content//drive//MyDrive//GSP//CVRR_dataset_trajectory_clustering//cross.mat')['tracks']
traj_lst = []
for data_instance in traj_data:
    traj_lst.append(np.vstack(data_instance[0]).T)
```

```python
# Plotting
for traj in traj_lst:
    plt.plot(traj[:, 0], traj[:, 1])
```

```python
# 2 - Trajectory segmentation
degree_threshold = 5
for traj_index, traj in enumerate(traj_lst):
    hold_index_lst = []
    previous_azimuth = 1000
    for point_index, point in enumerate(traj[:-1]):
        next_point = traj[point_index + 1]
        diff_vector = next_point - point
        azimuth = (math.degrees(math.atan2(*diff_vector)) + 360) % 360
        if abs(azimuth - previous_azimuth) > degree_threshold:
            hold_index_lst.append(point_index)
            previous_azimuth = azimuth
    hold_index_lst.append(traj.shape[0] - 1)  # the last point of a trajectory is always kept
    traj_lst[traj_index] = traj[hold_index_lst, :]

# 3 - Distance matrix
def hausdorff(u, v):
    d = max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
    return d

traj_count = len(traj_lst)
D = np.zeros((traj_count, traj_count))

# This may take a while
for i in range(traj_count):
    for j in range(i + 1, traj_count):
        distance = hausdorff(traj_lst[i], traj_lst[j])
        D[i, j] = distance
        D[j, i] = distance

k = 10  # the number of clusters
medoid_center_lst, cluster2index_lst = kMedoids(D, k)
cluster_lst = np.empty((traj_count,), dtype=int)
for cluster in cluster2index_lst:
    cluster_lst[cluster2index_lst[cluster]] = cluster
plot_cluster(traj_lst, cluster_lst)
```

```python
mdl = DBSCAN(eps=400, min_samples=10)
cluster_lst = mdl.fit_predict(D)
plot_cluster(traj_lst, cluster_lst)
```

Based on a set of points, DBSCAN groups together the points that are close to each other, according to a distance measure and a minimum number of points, and marks as outliers the points that lie in low-density regions. It takes two main parameters: epsilon, which specifies how close two points must be to count as neighbors (if their distance is at most this value, they belong to the same neighborhood), and minPoints, the minimum number of points required to form a dense region.
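Since the notebook applies scikit-learn's `DBSCAN` to a precomputed distance matrix, a dependency-free toy sketch of the algorithm's core logic may make the two parameters concrete. Everything below (the point set and the `eps`/`min_pts` values) is illustrative only:

```python
import math

def dbscan(points, eps, min_pts):
    """Toy DBSCAN: returns one cluster label per point, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # low-density point -> outlier (for now)
            continue
        labels[i] = cluster            # i is a core point, start a cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point, keep expanding
        cluster += 1
    return labels

# two dense groups plus one isolated outlier
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=3))  # -> [0, 0, 0, 1, 1, 1, -1]
```

The isolated point never accumulates `min_pts` neighbors, so it stays labeled `-1`, exactly the dashed-black "noisy trajectory" case handled in `plot_cluster` above.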
```python
# import data
df_spacetime = pd.read_csv('//content//drive//MyDrive//GSP//MPIAB White Stork South African Population Argos.csv')
```

```python
df_spacetime.head(2)
```

```python
# initialise the figure object
fig = plt.figure(figsize=(14, 10))

# initialise the map using Basemap. You can manually set the area of the world
# you want displayed, and get various variations of the maps.
m = Basemap(projection='ortho', lat_0=10, lon_0=13, resolution='i')
m.fillcontinents(color='#191919', lake_color='#000000')  # dark grey land, black lakes
m.drawmapboundary(fill_color='#000000')                  # black background
m.drawcountries(linewidth=0.1, color="w")                # draw countries

# extract the coordinates into the variables x and y
x, y = m(df_spacetime['location-long'].values, df_spacetime['location-lat'].values)

# plot the positions of the birds as they migrate using a scatter plot.
# Refer to the Basemap documentation for the various parameters.
m.scatter(x, y, c="#1292db", lw=0, alpha=1, zorder=5, s=1)

# title for the image
plt.title("Migration of Lesser Black-backed Gulls", fontsize=20)
plt.show()
```

```python
set(df_spacetime['tag-local-identifier'])
```

```python
def plot_3d(df):
    # initialise a figure object
    fig = plt.figure(figsize=(14, 10))
    # add a 3D axes
    ax = fig.add_subplot(111, projection='3d')
    # scatter plot
    ax.scatter(df['location-lat'], df['location-long'], df['event-id'],
               c="#1292db", lw=0, alpha=1, zorder=5, s=3)
    ax.set_xlabel('Latitude')
    ax.set_ylabel('Longitude')
    ax.set_zlabel('Time of Year')
    ax.set_title("Migration of Lesser Black-backed Gulls in 2001", fontsize=20)
    plt.show()

# 20527 is the id of the bird whose trajectory we are interested in.
# Feel free to change the id of the bird.
k = df_spacetime.loc[df_spacetime['tag-local-identifier'] == 20527]

# Normalise the time axis, using event-id to determine the time stamp.
# Note: event-id is in increasing order within a range of numbers,
# so we can easily normalise it for a given bird.
k['event-id'] = (k['event-id'] - k['event-id'].mean()) / k['event-id'].std(ddof=0)
plot_3d(k)
```

```r
# load file
chicago_Df <- read.csv('H:/CY Tech Year 4/Semester 1/Geospatial/Datasets/Gun_Crimes_Heat_Map.csv')
head(chicago_Df)
```

```r
chicago_Df$Date <- format(as.POSIXct(chicago_Df$Date, format='%m/%d/%Y %H:%M:%S'),
                          format='%m/%d/%Y')
```

```r
read.city_boundary <- function(file_name){
  city <- readShapePoly(file_name)
  city <- fortify(gSimplify(city, tol=100), region='OBJECTID')
  rename(city, c(x="long", y="lat"))
}
```

```r
city <- read.city_boundary("H:/CY Tech Year 4/Semester 1/Geospatial/Datasets/cityboundary.shp")
```

```r
head(city)
```

```r
options(scipen=999)
ggplot(chicago_Df, aes(x = Longitude, y = Latitude)) +
  geom_density2d(data = chicago_Df, aes(x = X.Coordinate, y = Y.Coordinate),
                 size = 0.8, colour='red') +
  geom_path(data=city, aes(x, y, group=group), size=.2, colour='black') +
  scale_alpha_continuous(limits=c(0, 0.2), breaks=seq(0, 0.2, by=0.025)) +
  facet_wrap(~Primary.Type, scales = "free") +
  labs(x = "Longitude", y = "Latitude") +
  guides(colour = FALSE, alpha = FALSE) +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank()) +
  labs(title = "Chicago heatmap contours for each crime primary type")
```

```r
ggplot(chicago_Df, aes(x = Longitude, y = Latitude)) +
  stat_density_2d(data = chicago_Df,
                  aes(x = X.Coordinate, y = Y.Coordinate, fill = factor(stat(level))),
                  geom='polygon') +
  scale_fill_manual(values = c("deepskyblue","deepskyblue1","deepskyblue2","deepskyblue3",
                               "dodgerblue3","dodgerblue4","blue2","blue3","blue4")) +
  geom_path(data=city, aes(x, y, group=group), size=.2, colour='black') +
  facet_wrap(~Primary.Type) +
  labs(x = "Longitude", y = "Latitude") +
  theme_dark() +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none") +
  labs(title = "Chicago heatmap for each crime primary type")
```

```r
ggdiamonds <- ggplot(chicago_Df, aes(x = Longitude, y = Latitude)) +
  stat_density_2d(data = chicago_Df, geom='polygon',
                  aes(x = X.Coordinate, y = Y.Coordinate, fill = stat(nlevel)),
                  n = 200, bins = 50) +
  scale_fill_viridis_c(option = "A") +
  geom_path(data=city, aes(x, y, group=group), size=.6, colour='black') +
  facet_wrap(~Primary.Type) +
  labs(x = "Longitude", y = "Latitude") +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none")
plot_gg(ggdiamonds, multicore = TRUE, width=5, height=5, scale=250,
        windowsize=c(1400, 866), zoom = 0.55, theta=-30, phi = 30)
Sys.sleep(0.2)
render_snapshot()
```

```r
# same plot with a thinner boundary, rendered from a lower viewing angle
ggdiamonds <- ggplot(chicago_Df, aes(x = Longitude, y = Latitude)) +
  stat_density_2d(data = chicago_Df, geom='polygon',
                  aes(x = X.Coordinate, y = Y.Coordinate, fill = stat(nlevel)),
                  n = 200, bins = 50) +
  scale_fill_viridis_c(option = "A") +
  geom_path(data=city, aes(x, y, group=group), size=.2, colour='black') +
  facet_wrap(~Primary.Type) +
  labs(x = "Longitude", y = "Latitude") +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none")
plot_gg(ggdiamonds, multicore = TRUE, width=5, height=5, scale=250,
        windowsize=c(1400, 866), zoom = 0.55, phi = 10)
Sys.sleep(0.2)
render_snapshot()
```

```r
PT_date_count <- chicago_Df %>%
  select(Year, Primary.Type) %>%
  group_by(Year, Primary.Type) %>%
  summarise(n = n())
PT_date_count <- na.omit(PT_date_count)
head(PT_date_count)

cols <- c("deepskyblue","red","blue","green","yellow","black")
with(PT_date_count,
     scatterplot3d(Primary.Type, Year, n, type="h",
                   color=cols[as.numeric(PT_date_count$Primary.Type)],
                   main="Evolution of Chicago Crime Type Counts Over the Years",
                   xlab="Assault Type", zlab="Number of Crimes"))
legend("topleft", legend = levels(factor(PT_date_count$Primary.Type)),
       col = cols, pch=16, cex=0.8)
```

```python
# pip install geemap[lidar] open3d
```

Download a [sample LiDAR dataset](https://drive.google.com/file/d/1H_X1190vL63BoFYa_cVBDxtIa8rG-Usb/view?usp=sharing) from Google Drive. The zip file is 52.1 MB and the uncompressed LAS file is 109 MB.
```python
url = 'https://drive.google.com/file/d/1H_X1190vL63BoFYa_cVBDxtIa8rG-Usb/view?usp=sharing'
filename = 'madison.las'
if not os.path.exists(filename):
    geemap.download_file(url, 'madison.zip', unzip=True)
```

Read the LiDAR data
```python
las = geemap.read_lidar(filename)
```

The LAS header.
```python
las.header
```

The number of points.
```python
las.header.point_count
```

The list of features.
```python
list(las.point_format.dimension_names)
```

Inspect data.
```python
las.X
las.Y
las.Z
las.intensity
```

Visualize LiDAR data using the [open3d](http://www.open3d.org) backend.
```python
geemap.view_lidar(filename, backend='open3d')
```

<b><div align='center'>Gif of the pop-up window output by the code chunk above</div></b>
## 4.1: Introduction to point pattern statistics

```r
#1
disc10 <- disc(10)
#2
set.seed(123)
p_cluster <- rThomas(kappa = 0.35, scale = 1, mu = 3, win = disc10)
plot(p_cluster)
#3
quadrat.test(p_cluster, alternative = "clustered")
```

```r
#4
set.seed(123)
p_regular <- rStrauss(beta = 2.9, gamma = 0.025, R = .5, W = disc10)
plot(p_regular)
#5
quadrat.test(p_regular, alternative = "regular")
```

```r
#6
# Create a disc of radius 10
disc10 <- disc(10)
# Compute the rate as count divided by area
lambda <- 500 / area(disc10)
# Create a Poisson point pattern object
p_poisson <- rpoispp(lambda = lambda, win = disc10)
# Create a regular (Strauss) point pattern object
p_regular <- rStrauss(beta = 2.9, gamma = 0.025, R = .5, W = disc10)
```

```r
#7
# Calc nearest-neighbor distances for Poisson point data
nnd_poisson <- nndist(p_poisson)
#8
# Draw a histogram of nearest-neighbor distances
hist(nnd_poisson)
#9
# Estimate G(r)
G_poisson <- Gest(p_poisson)
```

#10

\begin{align}
G(r) = 1 - e^{-\lambda \pi r^2}
\end{align}

G(r) is the probability of a point having a nearest neighbor within a distance r (the corresponding CDF of the PDF of the nearest-neighbor distances).
To calculate G(r) for a pattern, look at each event and find the distance to its nearest neighboring event. Doing this for every event and plotting a histogram gives an estimate of the probability density function of the nearest-neighbor distances; the corresponding cumulative distribution function is the G(r) we are searching for.
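The procedure above can be sketched in a few lines (plain Python, hypothetical toy points; `spatstat`'s `Gest` additionally applies the edge corrections discussed next):

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each point to its nearest other point."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def G_hat(points, r):
    """Empirical G(r): fraction of points whose nearest neighbor lies within r."""
    nnd = nearest_neighbor_distances(points)
    return sum(d <= r for d in nnd) / len(nnd)

pts = [(0, 0), (0.5, 0), (5, 5), (5, 5.5), (9, 1), (2, 8)]
print(G_hat(pts, 1.0))  # fraction of points with nearest-neighbor distance <= 1
```

Evaluating `G_hat` over a grid of r values traces out exactly the empirical curve that `plot(G_poisson)` draws.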
Border correction in spatial statistics means correcting for the effect of the region's boundaries on the estimated value of G(r). This matters because a point near the boundary may have its true nearest neighbor outside the observed region, biasing the estimate. Several methods exist to correct for this, including periodic boundary conditions, which replicate the region in all dimensions to create an infinite array of identical copies and thereby remove the influence of the boundary on the estimate.
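Another standard approach, and the one `spatstat` applies when `correction = "border"` is requested later in this notebook, is the reduced-sample (border) estimator: only points at least r away from the boundary contribute, so no nearest neighbor can be hidden outside the window. A toy sketch (plain Python, hypothetical points and window):

```python
import math

def G_hat_border(points, r, xmin, xmax, ymin, ymax):
    """Reduced-sample (border) estimate of G(r): only points at least r
    from the window boundary contribute to the estimate."""
    def interior(p):
        x, y = p
        return (x - xmin >= r and xmax - x >= r and
                y - ymin >= r and ymax - y >= r)

    contributors = [p for p in points if interior(p)]
    if not contributors:
        return float('nan')
    hits = 0
    for p in contributors:
        nnd = min(math.dist(p, q) for q in points if q != p)
        hits += nnd <= r
    return hits / len(contributors)

# the two corner points are excluded from the estimate at r = 1
pts = [(0.2, 0.2), (2, 2), (2.5, 2), (5, 5), (5.2, 5.2), (9.8, 9.8)]
print(G_hat_border(pts, r=1.0, xmin=0, xmax=10, ymin=0, ymax=10))
```

Note that excluded points still serve as potential nearest neighbors; they just don't contribute their own nearest-neighbor distance.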
```r
#12
# Plot G(r) vs. r
plot(G_poisson)
```

```r
#13
# Repeat for regular point data
nnd_regular <- nndist(p_regular)
hist(nnd_regular)
G_regular <- Gest(p_regular)
plot(G_regular)
```

```r
#13
# Estimate the K-function for the Poisson points
K_poisson <- Kest(p_poisson, correction = "border")
# The default plot shows quadratic growth
plot(K_poisson, . ~ r)
```

14) Explain the role of the K-function and the envelope, and generally speaking how Ripley's K works.
\begin{align}
K(r) = \pi r^2 \qquad \text{(the area of a circle of radius } r\text{, under complete spatial randomness)}
\end{align}

Ripley's K-function is used to compare a given point distribution with a random one: the point distribution under investigation is tested against the null hypothesis that the points are distributed randomly and independently. K(r) is defined as the expected number of further points within a distance r of a typical point of the process, scaled by the intensity.
An envelope is a tool to assess uncertainties on K-function estimates. Points are randomly sampled from a uniform Poisson process in the area and the K-function of the simulated data is computed; this process is repeated 99 times, and the minimum and maximum values of K are taken over each of the distance values. If the K-function from the data goes above the top of the envelope, we have evidence for clustering; if it goes below the envelope, there is evidence for an inhibitory process causing points to be spaced out.
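A naive, edge-correction-free sketch of the K estimator may help (plain Python, hypothetical points and window area; `Kest` uses proper corrections). It implements K̂(r) = |A| · #{ordered pairs within distance r} / (n(n−1)):

```python
import math

def K_hat(points, r, area):
    """Naive Ripley's K estimate: expected number of further points within
    distance r of a typical point, divided by the intensity n/area.
    No edge correction."""
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= r)
    return area * pairs / (n * (n - 1))

# Two tight clumps in a 10x10 window: K(r) far exceeds the CSR value pi * r^2,
# which is the signature of clustering.
pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
r = 1.5
print(K_hat(pts, r, area=100.0), "vs CSR value", math.pi * r**2)
```

Comparing the estimate against pi * r^2 (or against a simulated envelope of such estimates) is exactly what the `plot(K_cluster_env, ...)` calls below do.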
```r
#15
# Subtract pi * r ^ 2 from the Y-axis and plot
plot(K_poisson, . - pi * r ^ 2 ~ r)
```

Generally speaking, if the data was generated by a Poisson process, the line should be close to zero for all values of r, so we subtract pi * r^2 to verify this fact. In the plot above, we see that the red line stays at zero for all values of r.
```r
#16, 17
# Compute envelopes of K under random locations
K_cluster_env <- envelope(p_cluster, Kest, correction = "border")
```

```r
#18 (1)
# Insert the full formula to plot K minus pi * r^2
plot(K_cluster_env, . - pi * r ^ 2 ~ r)
```

```r
#18 (2)
# Repeat for regular data
K_regular_env <- envelope(p_regular, Kest, correction = "border")
plot(K_regular_env, . - pi * r ^ 2 ~ r)
```

```r
data(bei)
plot(bei)
```

```r
plot(bei.extra$elev)
plot(bei, add = TRUE, cex = 0.3)
```

```r
quadrat.test(bei, alternative = "clustered")
```

From the quadrat test above, we can conclude that the tree pattern is clustered: since the p-value < 2.2e-16, we reject the null hypothesis that the data comes from a uniform Poisson point process.
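The statistic behind the quadrat test can be sketched directly (plain Python, hypothetical points and window; `spatstat`'s `quadrat.test` also handles irregular windows and computes the p-value): divide the window into quadrats, count points per quadrat, and compare the counts to the uniform expectation with a chi-square statistic.

```python
def quadrat_chi2(points, xmin, xmax, ymin, ymax, nx, ny):
    """Chi-square statistic for quadrat counts: under CSR every quadrat
    has the same expected count n / (nx * ny)."""
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        col = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        row = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[row][col] += 1
    expected = len(points) / (nx * ny)
    return sum((c - expected) ** 2 / expected
               for row in counts for c in row)

# all 8 points crammed into one corner of a 10x10 window -> large statistic,
# i.e. strong evidence of clustering
clustered = [(0.1 * i, 0.1) for i in range(1, 9)]
print(quadrat_chi2(clustered, 0, 10, 0, 10, 2, 2))  # -> 24.0
```

A large value of the statistic relative to the chi-square distribution is what drives the tiny p-value reported for the `bei` trees.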
## 4.2: Point pattern statistics with Preston Crime

```r
preston_crime_df <- read.table(".//..//Datasets//preston crime-spatstat.csv",
                               sep = ';', header = FALSE, skip = 1)[, -c(1)]
colnames(preston_crime_df) <- c('x', 'y', 'mark')
summary(preston_crime_df)

preston_osm <- read.table(".//..//Datasets//osm_preston_gray.csv",
                          sep = ';', skip = 1)[, -c(1)]
colnames(preston_osm) <- c("osm_preston_gray.1", "osm_preston_gray.2",
                           "osm_preston_gray.3", "osm_preston_gray.4")
summary(preston_osm)

filename <- file.choose()
preston_crime <- readRDS(filename)
# preston_osm <- readRDS("osm_preston.rds")
# Get some summary information on the dataset
summary(preston_crime)

filename <- file.choose()
preston_osm <- readRDS(filename)
# Get some summary information on the dataset
summary(preston_osm)

# Get a table of marks
table(marks(preston_crime))

# Define a function to create a map
preston_map <- function(cols = c("green", "red"), cex = c(1, 1), pch = c(1, 1)) {
  plotRGB(preston_osm)  # from the raster package
  plot(preston_crime, cols = cols, pch = pch, cex = cex,
       add = TRUE, show.window = TRUE)
}
# Draw the map with colors, sizes and plot character
preston_map(cols = c("black", "red"), cex = c(0.5, 1), pch = c(19, 19))

# 1
# Use the split function to show the two point patterns
crime_splits <- split(preston_crime)
# 2
# Plot the split crime
plot(crime_splits)

# 3
# Compute the densities of both sets of points
crime_densities <- density(crime_splits)
# 4
# Calc the violent density divided by the sum of both
frac_violent_crime_density <- crime_densities[["Violent crime"]] /
  (crime_densities[["Non-violent crime"]] + crime_densities[["Violent crime"]])
# 5
# Plot the density of the fraction of violent crime
plot(frac_violent_crime_density)
```

```r
london_ref <- readRDS(".//..//Datasets//london_eu.rds")
summary(london_ref)

# Which boroughs voted to "Remain"?
london_ref$NAME[london_ref$Leave < london_ref$Remain]

# 1
# Plot a map of the percentage that voted "Remain"
spplot(london_ref, zcol = "Pct_Remain")

# 2
# Use the cartogram and rgeos packages
library(cartogram)
library(rgeos)
library(spdep)
# Make a cartogram, scaling the area to the electorate
carto_ref <- cartogram_cont(london_ref, "Electorate")
plot(carto_ref)

# 3
# Make neighbor list
borough_nb <- poly2nb(london_ref)
print(borough_nb)

# 4
# Get center points of each borough
borough_centers <- coordinates(london_ref)
# 5
# Show the connections
plot(london_ref); plot(borough_nb, borough_centers, add = TRUE)

# 6
# Map % Remain
spplot(london_ref, zcol = "Pct_Remain")

# 7
# Run a Moran I MC test on % Remain
moran.mc(london_ref$Pct_Remain, nb2listw(borough_nb), 999)
```

8) Explain how the Moran's I distribution is made:
The Moran's I distribution is created by repeating the steps of a permutation test for the Moran's I statistic a large number of times: permute the values of the dependent variable among the observations, calculate the Moran's I statistic for each permutation, and store the resulting values. This yields the distribution of possible values of the Moran's I statistic under the null hypothesis of no spatial autocorrelation.
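The permutation scheme just described can be sketched in plain NumPy (a simplified stand-in for `spdep`'s `moran.mc`, using a made-up line of ten areas rather than the London boroughs):

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x and a symmetric binary weights matrix W."""
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)

def moran_permutation(x, W, nsim=999, rng=None):
    """Permutation distribution of Moran's I and a one-sided pseudo p-value."""
    rng = np.random.default_rng(rng)
    observed = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(nsim)])
    p = (1 + (sims >= observed).sum()) / (nsim + 1)
    return observed, sims, p

# toy example: 10 areas on a line, rook neighbours, a smooth spatial trend
n = 10
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1          # adjacent areas are neighbours
x = np.arange(n, dtype=float)                  # strongly autocorrelated values

obs, sims, p = moran_permutation(x, W, nsim=999, rng=1)
# the trend puts the observed I far into the right tail, so p is small
```

Comparing the observed statistic against `sims` is exactly the comparison `moran.mc` reports as its rank-based p-value.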
```r
london <- readRDS(".//..//Datasets//london_2017_2.rds")

# 1
# Map the OBServed number of flu reports
spplot(london, "Flu_OBS")

# 2
# Compute and print the overall incidence of flu
r <- sum(london$Flu_OBS) / sum(london$TOTAL_POP)
r

# 3
# Calculate the expected number for each borough
london$Flu_EXP <- london$TOTAL_POP * r

# 4
# Calculate the ratio of OBServed to EXPected
london$Flu_SMR <- london$Flu_OBS / london$Flu_EXP

# 5
# Map the SMR
spplot(london, "Flu_SMR")
```

SMR stands for Standardized Morbidity Ratio: the number of cases per person, scaled by the overall incidence so that the expected value is 1.
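The SMR arithmetic above is small enough to illustrate with made-up numbers (three hypothetical boroughs, not the London data):

```python
obs = [30, 12, 45]                 # observed flu cases per borough
pop = [10_000, 8_000, 9_000]       # borough populations

r = sum(obs) / sum(pop)                      # overall incidence (cases/person)
expected = [p * r for p in pop]              # expected cases per borough
smr = [o / e for o, e in zip(obs, expected)]
# by construction sum(expected) == sum(obs), so the
# expected-count-weighted average SMR is exactly 1
```

An SMR above 1 flags a borough with more cases than its population size alone would predict.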
```r
# 1
# Fit a poisson GLM.
model_flu <- glm(Flu_OBS ~ HealthDeprivation, offset = log(TOTAL_POP),
                 data = london, family = poisson)
# 2
# Put residuals into the spatial data.
london$Flu_Resid <- residuals(model_flu)
# Map the residuals using spplot
spplot(london, "Flu_Resid")
```

The yellow borough had many more cases of flu than expected; the black borough had many fewer cases.
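The Poisson GLM with a log-population offset can be mimicked in plain NumPy via iteratively reweighted least squares. This is a sketch on synthetic data (made-up deprivation scores and populations, not the London dataset), not R's `glm` implementation:

```python
import numpy as np

def poisson_irls(X, y, offset, n_iter=25):
    """Fit a Poisson GLM with log link and an offset by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)        # mean under the current fit
        W = mu                                # Poisson working weights
        z = X @ beta + (y - mu) / mu          # working response minus offset
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

rng = np.random.default_rng(0)
n = 200
deprivation = rng.normal(size=n)
pop = rng.integers(5_000, 50_000, size=n)
true_beta = np.array([-6.0, 0.5])             # intercept, deprivation slope
X = np.column_stack([np.ones(n), deprivation])
y = rng.poisson(np.exp(X @ true_beta + np.log(pop)))

beta_hat = poisson_irls(X, y, np.log(pop))    # recovers roughly (-6.0, 0.5)
```

The offset enters the linear predictor with a fixed coefficient of 1, which is exactly what `offset = log(TOTAL_POP)` does in the R call above.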
```r
# 3
# Compute the neighborhood structure and test spatial correlation of the residuals.
library(spdep)
borough_nb <- poly2nb(london)
moran.mc(london$Flu_Resid, listw = nb2listw(borough_nb), nsim = 999)

# 4
# Use R2BayesX
library(R2BayesX)
# Fit a classic GLM
model_flu <- glm(Flu_OBS ~ HealthDeprivation, offset = log(TOTAL_POP),
                 data = london, family = poisson)
summary(model_flu)

# 5
# Fit a Bayesian GLM
bayes_flu <- bayesx(Flu_OBS ~ HealthDeprivation,
                    offset = log(london$TOTAL_POP),
                    family = "poisson", data = data.frame(london),
                    control = bayesx.control(seed = 17610407))
# Summarize it
summary(bayes_flu)

# 6
# Look at the samples from the Bayesian model
plot(samples(bayes_flu))
```

7) Explain how the density of the HealthDeprivation parameter is built and what it means:
The density of the HealthDeprivation parameter is its posterior distribution: the probability distribution of the parameter given the data, estimated here from the posterior samples shown above.
```r
# 8
# Compute adjacency objects
borough_nb <- poly2nb(london)
borough_gra <- nb2gra(borough_nb)

# 9
# Fit spatial model
flu_spatial <- bayesx(
  Flu_OBS ~ HealthDeprivation + sx(i, bs = "spatial", map = borough_gra),
  offset = log(london$TOTAL_POP),
  family = "poisson", data = data.frame(london),
  control = bayesx.control(seed = 17610407))
# Summarize the model
summary(flu_spatial)
```

10) In contrast with the non-spatial model, we can now see the smooth-term variances.
```r
# 11
# Map the fitted spatial term only
london$spatial <- fitted(flu_spatial, term = "sx(i):mrf")[, "Mean"]
spplot(london, zcol = "spatial")

# 12
# Map the residuals
london$spatial_resid <- residuals(flu_spatial)[, "mu"]
spplot(london, zcol = "spatial_resid")

# 13
# Test residuals for spatial correlation
moran.mc(london$spatial_resid, nb2listw(borough_nb), 999)
```

From the test above we fail to reject the null hypothesis of no spatial autocorrelation in the residuals: the residuals show no spatial correlation, consistent with a well-specified model.
Note: 3D visualizations in this section are screenshots of the pop-up window produced by the code.
Importing GemPy
```python
# Setting options
np.random.seed(1515)
pd.set_option('display.precision', 2)
```

## Importing and creating a set of input data

The data used for the construction of a model in GemPy is stored in Python objects. The main data classes are:

- Surface_points
- Orientations
- Grid
- Surfaces
- Series
- Additional data
- Faults

We will see each of these classes in further detail in the future.

Most data can also be generated from raw data that comes in the form of CSV files (CSV = comma-separated values). Such files might be obtained by exporting model data from a different program such as GeoModeller, or by simply creating them in spreadsheet software such as Microsoft Excel or LibreOffice Calc.

In this tutorial, all input data is created by importing such CSV files. These exemplary files can be found in the `input_data` folder in the root folder of GemPy. The data comprises $x$-, $y$- and $z$-positional values for all surface points and orientation measurements. For the latter, poles, azimuth and polarity are additionally included. Surface points are furthermore assigned a formation. This might be a lithological unit such as "Sandstone" or a structural feature such as "Main Fault". It is decisive to remember that, in GemPy, interface position points mark the **bottom** of a layer. If such points are needed to represent the top of a formation (e.g. when modeling an intrusion), this can be achieved by defining a respectively inverted orientation measurement.

As we generate our `Data` from CSV files, we also have to define our model's real extent in $x$, $y$ and $z$, as well as declare a desired resolution for each axis. This resolution will in turn determine the number of voxels used during modeling. Here, we rely on a medium resolution of 50x50x50, amounting to 125,000 voxels. The model extent should be chosen so that it contains all relevant data in a representative space. As our model voxels are not cubes but prisms, the resolution can take a different shape than the extent. We don't recommend going much higher than 100 cells in every direction (1,000,000 voxels), as higher resolutions become increasingly expensive to compute.
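A miniature example of the kind of CSV table described above can be built with the standard library. The column names here are assumed for illustration; the real files live in the `gempy_data` repository and are loaded via `gp.init_data` below:

```python
import csv
import io

# a hypothetical three-row surface-points table (coordinates made up)
surface_points_csv = """X,Y,Z,formation
1000,1000,450,Sandstone_2
1500,800,300,Shale
700,1200,520,Main_Fault
"""

rows = list(csv.DictReader(io.StringIO(surface_points_csv)))
formations = {r["formation"] for r in rows}
# each row is one interface point marking the *bottom* of its formation
```

Orientation files carry the same positional columns plus pole, azimuth and polarity values for each measurement.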
```python
geo_model = gp.create_model('Tutorial_ch1_1_Basics')

data_path = 'https://raw.githubusercontent.com/cgre-aachen/gempy_data/master/'
# Importing the data from CSV-files and setting extent and resolution
gp.init_data(geo_model, [0, 2000., 0, 2000., 0, 750.], [50, 50, 50],
             path_o=data_path + "/data/input_data/getting_started/"
                                "simple_fault_model_orientations.csv",
             path_i=data_path + "/data/input_data/getting_started/"
                                "simple_fault_model_points.csv",
             default_values=True)

geo_model.surfaces
```

The input data can then be listed using the command `get_data`. Note that the order of formations and respective allocation to series is still completely arbitrary. We will fix this in the following.
```python
gp.get_data(geo_model, 'surface_points').head()
gp.get_data(geo_model, 'orientations').head()
```

### Declaring the sequential order of geological formations

```python
geo_model.surfaces
gp.map_stack_to_surfaces(geo_model,
                         {"Fault_Series": 'Main_Fault',
                          "Strat_Series": ('Sandstone_2', 'Siltstone', 'Shale',
                                           'Sandstone_1', 'basement')},
                         remove_unused_series=True)
geo_model.surfaces
geo_model.stack
geo_model.set_is_fault(['Fault_Series'])
geo_model.faults.faults_relations_df
geo_model.faults
geo_model.faults.faults_relations_df
```

### Returning information from our input data

Our model input data, here named "geo_model", contains all the information that is essential for the construction of our model. You can access different types of information by using `gp.get_data` or simply by accessing the attributes. We can, for example, return the coordinates of our modeling grid via:
```python
geo_model.grid
```

As mentioned before, GemPy's core algorithm is based on interpolation of two types of data:

- surface_points and
- orientation measurements

(if you want to know more about how this interpolation algorithm works, check out our paper: https://www.geosci-model-dev.net/12/1/2019/gmd-12-1-2019.pdf).

We introduced the function `get_data` above. You can also specify which kind of data you want to call by declaring the string attribute `dtype` to be either `'surface_points'` (interfaces) or `'orientations'`.

### Interfaces Dataframe:
```python
gp.get_data(geo_model, 'surface_points').head()
gp.get_data(geo_model, 'orientations')
```

Notice that now all **surfaces** have been assigned to a **series** and are displayed in the correct order (from young to old).

### Visualizing input data

We can also visualize our input data. This might for example be useful to check if all points and measurements are defined the way we want them to. Using the function `plot_data`, we attain a 2D projection of our data points onto a plane of chosen *direction* (we can choose this attribute to be either $x$, $y$ or $z$).
```python
plot = gp.plot_2d(geo_model, show_lith=False, show_boundaries=False)
plt.show()
```

Using `plot_data_3D`, we can also visualize this data in 3D. Note that direct 3D visualization in GemPy requires the [Visualization Toolkit](https://www.vtk.org/) (VTK) to be installed.

All 3D plots in GemPy are interactive. This means that we can drag and drop any data point and measurement. The perpendicular axis views in VTK are particularly useful to move points solely on a desired 2D plane. Any changes will then be stored permanently in the "InputData" dataframe. If we want to reset our data points, we will then need to reload our original input data.

Executing the cell below will open a new window with a 3D interactive plot of our data.
```python
gpv = gp.plot_3d(geo_model, image=False, plotter_type='basic')
```

### Model generation

Once we have made sure that we have defined all our primary information as desired in our object `DataManagement.InputData` (named `geo_data` in these tutorials), we can continue with the next step towards creating our geological model: preparing the input data for interpolation.

This is done by generating an `InterpolatorData` object (named `interp_data` in these tutorials) from our `InputData` object via the following function:
```python
gp.set_interpolator(geo_model,
                    compile_theano=True,
                    theano_optimizer='fast_compile')
```

This function rescales the extent and coordinates of the original data (and stores it in the attribute `geo_data_res`, which behaves as a usual `InputData` object) and adds mathematical parameters that are needed for conducting the interpolation. The computation of this step may take a while, as it also compiles a theano function which is required for the model computation. However, should this not be needed, we can skip it by declaring `compile_theano = False` in the function.

Furthermore, this preparation process includes an assignment of numbers to each formation. Note that GemPy always creates a default basement formation as the last formation number. Afterwards, numbers are allocated from youngest to oldest as defined by the sequence of series and formations. On the property `formations` of our interpolation data, we can find out which number has been assigned to which formation:
The parameters used for the interpolation can be returned using the function `get_kriging_parameters`. These are generated automatically from the original data, but can be changed if needed. However, users should be careful doing so if they do not fully understand their significance.
```python
gp.get_data(geo_model, 'kriging')
```

At this point, we have all we need to compute our full model via `compute_model`. By default, this will return two separate solutions in the form of arrays. The first gives information on the lithological formations, the second on the fault network in the model. These arrays consist of two subarrays as entries each:

1. Lithology block model solution:
   - Entry [0]: This array shows what kind of lithological formation is found in each voxel, as indicated by a respective formation_number.
   - Entry [1]: Potential field array that represents the orientation of lithological units and layers in the block model.
2. Fault network block model solution:
   - Entry [0]: Array in which all fault-separated areas of the model are represented by a distinct number contained in each voxel.
   - Entry [1]: Potential field array related to the fault network in the block model.

Below, we illustrate these different model solutions and how they can be used.
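As a toy stand-in for the layout just described (mock ids, not real GemPy output), the lithology block is a flat array of one formation id per voxel that can be reshaped back to the model grid:

```python
import numpy as np

res = (50, 50, 50)                            # the resolution chosen earlier
block = np.ones(np.prod(res), dtype=int)      # start as basement (id 1)
block[: block.size // 2] = 2                  # pretend half the voxels are id 2

lith_3d = block.reshape(res)                  # back to (nx, ny, nz)
section = lith_3d[:, 25, :]                   # a 2D slice, like cell_number=25
```

Slicing a reshaped block like this is what the 2D section plots below do under the hood.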
```python
sol = gp.compute_model(geo_model)
sol
geo_model.solutions
```

### Direct model visualization in GemPy

Model solutions can be easily visualized in 2D sections in GemPy directly. Let's take a look at our lithology block:
```python
gp.plot_2d(geo_model, show_data=True)
plt.show()
```

With `cell_number=25` and remembering that we defined our resolution to be 50 cells in each direction, we have chosen a section going through the middle of our block. We have moved 25 cells in `direction='y'`; the plot thus depicts a plane parallel to the $x$- and $z$-axes. Setting `plot_data=True`, we could plot original data together with the results. Changing the values for `cell_number` and `direction`, we can move through our 3D block model and explore it by looking at different 2D planes.

We can do the same with our lithological scalar-field solution:
```python
gp.plot_2d(geo_model, show_data=False, show_scalar=True, show_lith=False)
plt.show()

gp.plot_2d(geo_model, series_n=1, show_data=False, show_scalar=True, show_lith=False)
plt.show()
```

This illustrates well the fold-related deformation of the stratigraphy, as well as the way the layers are influenced by the fault.

The fault network modeling solutions can be visualized in the same way:
```python
geo_model.solutions.scalar_field_at_surface_points

gp.plot_2d(geo_model, show_block=True, show_lith=False)
plt.show()

gp.plot_2d(geo_model, series_n=1, show_block=True, show_lith=False)
plt.show()
```

### Marching cubes and vtk visualization

In addition to 2D sections, we can extract surfaces to visualize in 3D renderers. Surfaces can be visualized as 3D triangle complexes in VTK (see the function `plot_surfaces_3D` below). To create these triangles, we need to extract the respective vertices and simplices from the potential fields of lithologies and faults. This process is automated in GemPy with the function `get_surface`.

```python
ver, sim = gp.get_surfaces(geo_model)
gpv = gp.plot_3d(geo_model, image=False, plotter_type='basic')
```

Using the rescaled interpolation data, we can also run our 3D VTK visualization in an interactive mode which allows us to alter and update our model in real time. Similarly to the interactive 3D visualization of our input data, the changes are permanently saved (in the `InterpolationInput.dataframe` object). Additionally, the resulting changes in the geological models are re-computed in real time.
```python
geo_model.set_topography(d_z=(350, 750))

gp.compute_model(geo_model)
gp.plot_2d(geo_model, show_topography=True)
plt.show()

# sphinx_gallery_thumbnail_number = 9
gpv = gp.plot_3d(geo_model, plotter_type='basic', show_topography=True,
                 show_surfaces=True, show_lith=True, image=False)
```

### Compute at a given location

This is done by modifying the grid to a custom grid and recomputing. Notice that the results are given as *grid + surface_points_ref + surface_points_rest locations*.
```python
x_i = np.array([[3, 5, 6]])
sol = gp.compute_model(geo_model, at=x_i)
```

Therefore, if we just want the value at **x_i**:
```python
sol.custom
```

This returns the id and the scalar field values for each series.